Generating Chinese Couplets using a Statistical MT Approach
Abstract
Part of China's unique cultural heritage is the game of Chinese couplets (duìlián). One person challenges another with a sentence (the first sentence). The other replies with a second sentence equal in length and word segmentation, such that corresponding words in the two sentences match each other under constraints on semantic, syntactic, and lexical relatedness. This task is viewed as a difficult problem in AI and has not been explored in the research community. In this paper, we treat the task as a kind of machine translation process and present a phrase-based SMT approach to generating the second sentence. First, the system takes the first sentence as input and uses a phrase-based SMT decoder to generate an N-best list of candidate second sentences. Then, a set of filters removes candidates that violate linguistic constraints. Finally, a Ranking SVM reranks the remaining candidates. A comprehensive evaluation using both human judgments and BLEU scores has been conducted, and the results demonstrate that this approach is very successful.
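The three stages described in the abstract (N-best generation, constraint filtering, reranking) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the decoder is a hypothetical stub that returns a fixed candidate list, the only filter shown is the equal-length and equal-segmentation constraint named in the abstract, and the reranker is a plain linear scorer standing in for a trained Ranking SVM. The names `toy_decoder`, `featurize`, and `weights` are all invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

# A candidate is (segmented second sentence, decoder score).
Candidate = Tuple[List[str], float]


def same_segmentation(first: Sequence[str], second: Sequence[str]) -> bool:
    """Constraint from the abstract: same number of words, and each
    aligned word pair has the same number of characters."""
    return len(first) == len(second) and all(
        len(a) == len(b) for a, b in zip(first, second)
    )


def toy_decoder(first: Sequence[str], n_best: int) -> List[Candidate]:
    """Hypothetical stand-in for a phrase-based SMT decoder.
    It ignores its input and returns a fixed, made-up N-best list."""
    candidates = [
        (["天空", "任", "鸟", "飞"], -0.9),      # wrong segmentation
        (["天", "高", "任", "鸟", "飞"], -1.2),
        (["山", "高", "任", "鸟", "飞"], -1.5),
    ]
    return candidates[:n_best]


def featurize(first: Sequence[str], cand: Candidate) -> List[float]:
    """Hypothetical feature vector; here only the decoder score."""
    return [cand[1]]


def generate_second_sentence(
    first: Sequence[str],
    decode: Callable[[Sequence[str], int], List[Candidate]],
    filters: Sequence[Callable[[Sequence[str], Sequence[str]], bool]],
    weights: Sequence[float],
    n_best: int = 10,
) -> List[Candidate]:
    # Stage 1: N-best list from the decoder.
    candidates = decode(first, n_best)
    # Stage 2: drop candidates that violate any constraint.
    kept = [c for c in candidates if all(f(first, c[0]) for f in filters)]

    # Stage 3: rerank with a linear score w . f(x). A trained Ranking SVM
    # would supply the weights; these are assumed for the sketch.
    def score(c: Candidate) -> float:
        return sum(w * x for w, x in zip(weights, featurize(first, c)))

    return sorted(kept, key=score, reverse=True)


first_sentence = ["海", "阔", "凭", "鱼", "跃"]
ranked = generate_second_sentence(
    first_sentence, toy_decoder, [same_segmentation], weights=[1.0]
)
```

Here the top-scoring decoder output is discarded by the filter because its segmentation differs from the first sentence, which is why filtering precedes reranking in this sketch.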
Similar Resources
Generating Chinese Couplets and Quatrain Using a Statistical Approach
We propose a novel statistical approach to automatically generate Chinese couplets and Chinese poetry. For Chinese couplets, the system takes as input the first sentence and generates as output an N-best list of second sentences using a phrase-based SMT model. A comprehensive evaluation using both human judgments and BLEU scores has been conducted and the results demonstrate that this approach ...
Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation
Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply a Chinese word segmenter trained from manually annotated data, using a fixed lexicon. Such word segmentation is not necessarily optimal for translation. We propose a Bayesian semi-supervised Chinese word segmentation m...
Bagging and Boosting statistical machine translation systems
In this article we address the issue of generating diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Unlike traditional approaches, we do not resort to multiple structurally different SMT systems, but instead directly learn a strong SMT system from a single translation engine in a principled w...
Minimum Bayes-Risk Techniques in Automatic Speech Recognition and Statistical Machine Translation
Automatic Speech Recognition (ASR) and Machine Translation (MT) are fundamental language technologies that are emerging as core components of information processing systems. Each of these problems can be evaluated using a variety of metrics that measure different aspects of recognition or translation performance. In contrast, the training and decoding architectures of most of the current ASR an...
To Swap or Not to Swap? Exploiting Dependency Word Pairs for Reordering in Statistical Machine Translation
Reordering poses a major challenge in machine translation (MT) between two languages with significant differences in word order. In this paper, we present a novel reordering approach utilizing sparse features based on dependency word pairs. Each instance of these features captures whether two words, which are related by a dependency link in the source sentence dependency parse tree, follow the ...